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RIEMANNIAN OPTIMIZATION 

ON TENSOR PRODUCTS OF GRASSMANN MANIFOLDS: 

APPLICATIONS TO GENERALIZED RAYLEIGH-QUOTIENTS 

O. CURTEFtt, G. DIRRt, AND U. HELMKEt 

Abstract. We introduce a generalized Rayleigh-quotient p^ on the tensor product of Grassman- 
nians Gr®''(m, n) enabling a unified approach to well-known optimization tasks from different areas 
of numerical linear algebra, such as best low-rank approximations of tensors (data compression), geo- 
metric measures of entanglement (quantum computing) and subspace clustering (image processing). 
We briefly discuss the geometry of the constraint set Gr®'^(m, n), we compute the Ricmannian gra- 
^ ' dient of py^, we characterize its critical points and prove that they are gencrically non-degenerated. 

Q ' Moreover, we derive an explicit necessary condition for the non-degeneracy of the Hessian. Finally, 

h-^ ^ we present two intrinsic methods for optimizing px — a- Ncwton-likc and a conjugated gradient — 

^— ( , and compare our algorithms tailored to the above-mentioned applications with established ones from 

^~s ■ the literature. 
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C^ , 1. Introduction. The present paper addresses a constrained optimization prob- 

lem, subsuming and extending optimization tasks which arise in various areas of ap- 
pUcations such as (i) low-rank tensor approximation problems from signal processing 
and data compression, (ii) geometric measures of pure state entanglement from quan- 
tum computing, (iii) subspace reconstruction problems from image processing and 
^^ ' (iv) combinatorial problems. 

l/^ , The problem can be stated as follows: Given a collection of integer pairs {nij, Uj) 

QO ' with 1 < rrij < rij for j = 1, . . . ,r and a Hcrmitian N x N matrix A with N := 

~~^. , nin2 ■ ■ -rir, find the global maximizer of the trace function P t-^ tr{AP). Here, P 

lO ' is restricted to the set of all Hcrmitian projectors P : C^ -^ C^ of rank M := 

^^ , TO1TO2 • • • rrir, which can be represented as a tensor product P := Pi ® • ■ • ® Pr of 

Hcrmitian projectors Pj : C"j — >■ C"^ of rank rrij. Thus, one is faced with the 
constrained optimization task 
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k> ; maxtr(AP) subject to P e Gr'^''(m, n), (1.1) 

H 

Cd ' where Gr (m, n) denotes the set of all Hcrmitian projectors of the above tensor 

type and (m, n) is a shortcut for ((mi, rii), . . . , (m^, rir)) . We will see that it makes 

sense to call the above objective function P i— !> tr(y4P) =: pa(P) the generalized 

Rayleigh-quotient of A with respect to the partitioning (m, n). 

To the best of the authors' knowledge, problem (jl.ip has not been discussed in 

the literature in this general setting. However, depending on the structure of A as 

well as on the choice of (ni,n), problem (jl.ip relates to well-known numerical linear 

algebra issues: 

(i) For Hcrmitian matrices of rank-1, i.e. A = vv'^ , it reduces to a best low-rank 
approximation problem for the tensor A G C"ixn2x-xnr -^^jjich satisfies v = vec(^), 
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cf. |21[ 128] . Classical application areas of such low-rank approximations can be found 
in statistics, signal processing and data compression j¥l [2ni[^l31j . 

(ii) A recent application in quantum computing plays a central role in charac- 
terizing and quantifying pure state entanglement. Here, the distance of a pure state 
(tensor) to the set of all product states (rank-1 tensors) provides a geometric measure 
for entanglement |6l [23l [34] . 

(iii) Moreover, the challenging task of recovering subspaces of possibly different 
dimensions from noisy data — known as subspace detection or subspace clustering 
problem in computer vision and image processing j33j — can also be cast into the 
above setting. More precisely, for an appropriately chosen Hermitian matrix A the 
subspace clustering task can be characterized by problem (jl.ip in the sense that for 
unperturbed data the global minima of the generalized Rayleigh-quotient are in unique 
correspondence with the sought subspaces. Numerical experiments in Section 4 sup- 
port that even for noisy data the proposed optimization yields reliable approximations 
of the unperturbed subspaces. 

(iv) In [3] a certain class of combinatorial problems are recast as optimization 
problems for trace functions on the special unitary group. For the case when A 
is a diagonal matrix, optimization task (|l.ip is a generalization of the applications 
mentioned in [3j. 

Our solution to problem (|1.1|) is based on the fact that the constraint set Gr® (m, n) 
can be equipped with a Riemannian submanifold structure. This admits the use of 
techniques from Riemannian optimization — a rather new approach towards con- 
strained optimization exploiting the geometrical structure of the constraint set in 
order to develop numerical algorithms [I] I14[ 132] . In particular, we pursue two ap- 
proaches: a Newton and a conjugated gradient method. 

On a Riemannian manifold, the intrinsic Newton method is usually described 
by means of the Levi-Civita connection, performing iterations along geodesies, see 
[9j [29] . A more general approach via local coordinates was initiated by Shub in [27] 
and further discussed in [T] [13]. Here, we follow the ideas in [13] and use a pair 
of local parametrizations — normal coordinates for the push-forward and QR-type 
coordinates for the pull-back — satisfying an additional compatibility condition to 
preserve quadratic convergence. Thus we obtain an intrinsically defined version of 
the classical Newton algorithm with some computational flexibility. Nevertheless, for 
high-dimensional problems its iterations are expensive, both in terms of computa- 
tional complexity and memory requirements. Therefore, we alternatively propose a 
conjugated gradient method, which has the advantage of algorithmic simplicity at a 
satisfactory convergence rate. In doing so, we suggest to replace the global line-search 
of the classical conjugated gradient method by a one-dimensional Newton-step, which 
yields a better convergence behavior near stationary points than the commonly used 
Armijo-rule. 

As mentioned earlier, depending on the structure of A, the above-specified prob- 
lems (i), (ii), (iii) and (iv) are particular cases of the optimization task ()1.1|) . For the 
best low-rank approximation of a tensor the standard numerical approach is an alter- 
nating least-squares algorithm, known as higher-order orthogonal iteration (HOOI) 
[2Tj . Recently, several new methods also exploiting the geometric structure of the 
problem have been published. Newton algorithms have been proposed in [8] [18], 
quasi- Newton methods in [28], conjugated gradient and trust region methods in [T7] . 
For high-dimensional tensors, all Riemannian Newton algorithms manifest similar 
problems: too high computational complexity and memory requirements. Our conju- 
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gated gradient method is however, a good candidate to solve large scale problems. It 
exhibits locally a good convergence behavior, comparable to that of the quasi-Ncwton 
methods in j28j at much lower computational costs, which considerably reduces the 
necessary CPU time. 

For the problem of estimating a mixture of linear subspaccs from sampled data 
points, cf. (iii), our numerical approach is an efficient alternative to the classical ones: 
ad-hoc type methods such as K-subspace algorithms |16j . or probabilistic methods 
using a Maximum Likelihood framework for the estimation |30j . 

The paper is organized as follows. In Section 2, we familiarize the reader with the 
basic ingredients of Riemannian optimization. In particular, we address the follow- 
ing topics: the Riemannian submanifold structure of the constraint set Gr®''(m, n), 
its isometry to the r-fold cartesian product of Grassmannians, geodesies and parallel 
transport and the computation of the intrinsic gradient and Hessian for smooth ob- 
jective functions. Section 3 is dedicated to the problem of optimizing the generalized 
Rayleigh-quotient pA^ including also a detailed discussion on its relation to problems 
(i), (ii), (iii) and (iv). Moreover, an analogy to the classical Rayleigh-quotient is also 
the subject of this section. We compute the gradient and the Hessian of the general- 
ized Rayleigh-quotient and derive critical point conditions. We end the section with a 
result on the generic non-degeneracy of its critical points. In Section 4, a Newton-like 
and a conjugated gradient algorithm as well as numerical simulations tailored to the 
previously mentioned applications are given. 

2. Preliminaries. 

2.1. Riemannian structure of Gr®'"(m,n). We start our study on the op- 
timization task (|l.ip with a brief summary on the necessary notations and basic 
concepts. 

Let f)er„ be the set of all Hermitian nxn matrices A, i.e. A G C"^" with ^4^ = A, 
where A^ refers to the conjugate transpose of A. Moreover, let SU„ be the Lie group 
of all special unitary matrices and su„ its Lie-algebra, i.e. G G SUn if and only if 
Q^Q = In, det6 = 1 and, respectively, fl £ su„ if and only if O^ = —fl and tr(J7) = 0. 
The Grassmannian, 

Gr™,„ := {P e C"^" \P = P^ = P\ tr(P) = m}, (2.1) 

is the set of all rank m Hermitian projection operators of C". It is a smooth and 
compact submanifold of f)er„ with real dimension 2m{n — to), whose tangent space at 
P is given by 

TpGr,„,„ = {[P, ^]:=Pn-nP\ne su„}, (2.2) 

cf. |13| . Hence, every element P G Gr^ „ and every tangent vector ^ G TpGr,„ „ can 
be written as 

p = en„, ,„et and e = ec™,„e^ (2.3) 

where H„i_„ is the standard projector of rank m acting on C" and Cm,™ denotes a 
tangent vector in the corresponding tangent space, i.e. 
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, ^^ G C™^^"-™). (2.4) 
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Whenever the vahies of m and n are clear from the context, we will use the shortcuts 
n and (^. With respect to the Riemannian metric induced by the Frobenius inner 
product of ()er„, the Grassmannian Gr,„_„ is a Riemannian submanifold and the unique 
orthogonal projector onto TpGr™ „ is given by 

ad%X=[P,[P,X]l Xe[)er„. (2.5) 

We define the r—fold tensor product of Grassmannians Gr^ . „ . , j = 1, . . . , r as 
the set 

Gr®'-(m, n) := {Pi ® ■ • • ® P, I P, G Gr„^.,„^. , j = 1, . . . , r} (2.6) 

of all rank-M Hermitian projectors of C^ with M := mim2 ■ ■ ■ rrir and N := nin2 ■ • ■ n^ 
which can be represented as a Kronecker product Pi • ■ • P^. Here, (m, n) stands 
for the niulti index 

(m, n) := Mmi, rii), (m2, 712), . . . , (m^, rir)) . (2.7) 

Then, Gr (m, n) can be naturally equipped with a submanifold structure as the 
following result shows. 

Proposition 2.1. The r—fold tensor product of Grassmannians Gr®'"(m, n) is 

r 

a smooth and compact submanifold of f)erjv of real dimension 2 > rni{ni — rrii). 
Proof. We consider the following smooth action 

(J : SU(n) X f)er^ -^ tier^v, (0, Y) ^ @Y&\ 
of the compact Lie group 

SU(n) ■.= {@:=Qi®---®Qr\ Qj e SU„J C SUat. (2.8) 

Let X G f)er^ be of the form X ;= Hi CE) • • • (g) H^, where Hj denotes the standard 
projector in Gr,„^.^„^.. Then, the orbit 0(X) := {@X&^\ G SU(n)} of X coincides 
with Gr®''(m, n). By [T3] (pp. 44-46) we conclude that the r—fold tensor product 
of Grassmannians is a smooth and compact submanifold of f)er^. Moreover, 0{X) = 
SU(n)/Stab(X), where the stabilizer subgroup oi X is given by 



Stab(X) := {© G SU(n) | &X& = X} 

^1 



{0 G SU(n) I OjUjO] = Uj, j = 1, . . . ,r}. 



It follows easily that the dimension of Stab(X) is T^ [m^ + (rii— rri^)^ — 1] and therefore. 



1=1 



dim (Gr®''(m, n)) = dim SU(n) - dim Stab(X) = 2 V TOj(nj - rrij) 






is the dimension of the r—fold tensor product of Grassmannians. D 

Remark 2.2. (a) Let V<S^W denote the tensor product of finite dimensional vec- 
tor spaces V and W , cf. fIR[W^ and let Xi^Y : V®W — > V"(X)W^ be the tensor product 
of X £ End(F) and Y G End(W^), given by v ® w ^^ Xv Yw, for all v £ V and 
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w (^ W. Moreover, let By and Bw be bases ofV andW , respectively. Then, the matrix 
representation of X ®Y with respect to the product basis {v ®w \v & By, w G B-^r} 
ofV®W is given by the Kronecker product of the matrix representations of A and B 
with respect to By and Bw ■ This clarifies the relation between the "abstract " tensor 
product of linear maps and the Kronecker product of matrices and justifies the term 
"tensor product" of Grassmannians when we refer to Gr (m, n). 
(&) It is a well-known fact that the Grassmannian Gr,„_„ is diffeomorphic to the Grass- 
mann manifold Grass„i_„ of all m^ dimensional subspaces of C-"" , cf. \14^ . Therefore, 
Gr,„j_„j ®Grm2_„2 is diffeomorphic to 

{Vi (8) V2 I Fi e Grass„j^„j, V2 £ Grass^^^nJ C GrassM^w, (2.9) 

where M :— mim2 and N :~ nin2. 

Both items (a) and (b) readily generalize to an arbitrary number of Grassmannians. 

We conclude this subsection by pointing out an isometry between the r— fold 
tensor product of Grassmannians Gr (m, n) and the direct r—fold product of Grass- 
mannians 

Gr^'-(m, n) := {(Pi, . . . , P,) | P, e Gr,„^.,„^., j^l,...,r}. (2.10) 

The vector spaces f)er^ and f)ec„^ x • ■ • x fier„^ endowed with the inner products 

{X,Y):^tr{XY) (2.11) 

and 

((Xi, . . . , Xr), (Fi, . . . , Yr)) := tr(Xiri) + ■ • ■ + tr(X,i;), (2.12) 

induce a Riemannian submanifold structure on Gr®''(m, n) and Gr^''(m, n), respec- 
tively. 

Proposition 2.3. The map 

(p:Gr''''(m,n) ^Gr®''(m,n) , (Pj, . . . , P^) h^ Pi ® • • • ® P^ (2.13) 

is a diffeomorphism between Gr^'^(m, n) and Gr (m, n). Moreover, ip is a global 
Riemannian isometry when the right-hand side of i2.12\) is replaced by 

Ml tr(Xiri) + ---+Mr tT{XrYr), (2.14) 

r 

with Mj := I I ruk, for j — 1, . . . ,r. 

k=l, k^j 

Note that the isometry between Gr^'^(ni, n) and Gr'^'^(m, n) is very special, as 
in general the map 

[)cr„j X • • ■ X f)er„^ -> (lec^, {Xi, . . . ,Xr) ^ XiiSi ■ ■ ■ <S> Xr (2.15) 

fails even to be injective. For the proof of Proposition 12.31 we refer to the Appendix. 
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2.2. Geodesies and parallel transport. It is well-known that every Rie- 
mannian manifold M carries a unique Riemannian or Levi-Civita connection V, 
e.g. [2 [UJ [32] ■ By means of V, one defines parallel transport and geodesies as 
follows. Let 1 1—)- X{t) be a vector field along a curve 7 on A^, i.e. X(t) G T^(t)A^ for 
all t G K. Then, X is defined to be parallel along 7 if 

V^^t)X{t) = (2.16) 

for all f e M. Given ^ e T-y(o)-'^! there exists a unique parallel vector field X along 7 
such that X{Q) ~ ^ and the vector X{t) G Tj(t)Ai is called the parallel transport of ^ 
to T-y(t)A^ along 7. In particular, 7 is called a geodesic on Al, if 7 is parallel along 7. 
For the Grassmann manifold Gr^n, the curve t i-> 7(f) = e^'^^'^lPe*'^'^! de- 
scribes the geodesic through P E Gr„i.„ in direction ^ e TpGrm_„, i.e. 7(i) satisfies 
equation (|2.16p with initial conditions 7(0) = P and 7(0) = £,. Similarly, it can be 
verified that the parallel transport of 77 S TpGr^^n to T.y(t)GTm,n along the geodesic 
through P in direction ^ is given by 77 i-> e~*^^'-^^rie*^^'-^y These notions can be 
straight-forward generalized to the direct product of Grassmannians Gr'^'"(ni, n). 

2.3. The Riemannian gradient and Hessian. First, let us recall that the 
Riemannian gradient at P G A^ of a smooth objective function / : A^ — > M on a 
Riemannian manifold A^ is defined as the unique tangent vector grad/(P) G TpAi 
satisfying 

d/(P)(6-(grad/(P),e) (2.17) 

for all ^ G TpAl, where d/(P) denotes the differential (tangent map) of / at P. 
Moreover, if V is the Levi-Civita connection on Ad, then the Riemannian Hessian of 
/ at P is the hnear map Hy (P) : Tp A^ — ;• TpM defined by 

H/(P)e-V5grad/(P), (2.18) 

for all ^ G TpA^. Now, if A^ is a submanifold of a vector space V, then ()2.17p and 
(|2.18|) simplify as follows. Let / and X be smooth extensions of / and of the vector 
field grad/, respectively. Then, 

grad/(P) = 7rp(V/(P)), llf{P)^ ^ 7rp{DX{P)^), (2.19) 

where irp is the orthogonal projection onto TpA^ and V/ denotes the standard gra- 
dient of / on V. 

For the generalized Rayleigh-quotient pA on Gr ' (m, n), explicit formulas of the 
gradient and Hessian will be given in Section 3.3. 

3. The generalized Rayleigh-quotient. Let Gr®'^(ni, n) be the r— fold tensor 
product of Grassmannians with (m, n) as in (|2.7p and let A G [)er^, N = nin2 ■ • ■ n^. 
In the following, we analyze the constrained optimization problem 

max tr(^P), (3.1) 

PGGr»'-(m,n) 

which comprises problems from different areas, such as multilinear low-rank approx- 
imations of a tensor, geometric measures of entanglement, subspace clustering and 
combinatorial optimization. These applications are naturally stated on a tensor prod- 
uct space. However, for the special case of the Grassmann manifold they can be 
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reformulated on a direet product spaee. To this purpose, we define tlie generalized 
Rayleigh- quotient of the matrix A as 

PA :Gr^''(m,n)^M, ^^(Pi, . . . , P^) := tr (a(Pi ® • • • ® P^)). (3.2) 

Based on the isometry between Gr®'^(m, n) and Gr^'"(m, n), we can rewrite problem 
p.ip as an optimization task for pA 

max pA{Pi,...,Pr)- (3.3) 

(Pi,....Pr)eGrX'-(m,n) 

In general this is not the case, as we have already pointed out in p.lSp . 

The term "generalized Rayleigh-quotient" is justified, since for r = 1 we obtain 
the classical Rayleigh-quotient pa{P) — ti{AP). In the sequel we want to point out 
some similarities and differences between the generalized and the classical Rayleigh- 
quotient. It is well known that under the assumption that there is a spectral gap 
between the eigenvalues oi A G l)tVj^, there is a unique maximizer and a unique mini- 
mizer of the classical Rayleigh-quotient of A. Unfortunately, this is no longer the case 
for the generalized Rayleigh-quotient pA- Global maximizers and global minimizers 
exist since the generalized Rayleigh-quotient is defined on a compact manifold, but 
unlike the classical case, it admits also local extrema as the following example shows. 
For the case when A is of rank one we refer to Example 3 in [21] . 

Example 3.1. Let A = diag(Ai, A2, A3, A4) G f|er4 be a diagonal matrix with 
A2 > A3 > A4 > Ai and Pj*, Pj* G Gri.2 of the form 



^1 



1 




and P2* 




1 



(3.4) 



The maximum of pA is obvious less or equal to A2. Since pa{Pi ■, P2) = A2, we have 
(P*,P2*) as the global maximizer of pA- From p.33p it follows that all (Pi,P2) G 
Gri^2xGri^2 with Pi and P2 diagonal, are critical points of p a- In particular {P2 , Pi) 
is a critical point of pA with pa{P2 , Pi) = A3 < A2. Moreover, one can check by 
computing the Hessian of pA at (PiiPi) j see (|3.39p . that (PiyP*) is actually a local 
maximizer of pA- Comparative to the classical Rayleigh-quotient, this strange behavior 
results from the fact that not all A x A permutation matrices are of the form Qi 02, 
with 61, 62 G SU2. 

While for the classical Rayleigh-quotient one knows that the maximizer and min- 
imizer are orthogonal projectors onto the space spanned by the eigenvectors corre- 
sponding to the largest and, respectively, smallest eigenvalues of A, it is difficult to 
provide an analog characterization for the global extrema of the generalized Rayleigh- 
quotient for an arbitrary matrix A. But, for particular A and r such a characterization 
is possible. 

(a) If r = 2 and A is of rank one, i.e. A = vec(r)vec(r)t, with Y G C"l^"^ then 
the generalized Rayleigh-quotient can be rewritten as 

Pa{PuP2) - tr[vec(r)vec(r)^(Pi ® P2)] = tr(r^PirP2). (3.5) 

Under the assumption that Y has full rank and distinct singular values there exist one 
maximizer and one minimizer. The maximizer (Pi*,P2*) G Gr^ (m, n) oi pA is given 
by the orthogonal projectors onto the space spanned by the m^, := min{7TT,i, 7712} left, 
respective right singular vectors corresponding to the largest m^ singular values. Sim- 
ilar for the minimizer, the singular vectors corresponding to the smallest m^ singular 
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values. 

(b) If r is arbitrary and A diagonalizable via a transformation of SU(n) = {Qi ® ■ • • (8) 
Qr I 0j G SU„ }, then we can assume without loss of generality that A is diagonal. 
Moreover, if A can be written as Ai (g) • • • g) A,., with Aj diagonal, which is always 
the case when A = Ai (E) ■ ■ ■ (^ Ar, Aj € f)er„ . , then the generalized Rayleigh-quotient 
becomes a product of r decoupled classical Rayleigh-quotients. Hence, there is one 
maximizer and one minimizer. However, there is a dramatic change if A cannot be 
written as a Kronecker product of diagonal matrices. In this case pA has also lo- 
cal extrema, as Example 13.11 shows. From p.33p one can immediately formulate the 
following critical point characterization. 

Proposition 3.2. Let A e ijcvj^ be diagonal. Then, {Pi,...,Pr) G Gr^'X^ij") 
is a critical point of pA if and only if Pj are permutations of the standard projectors 
Uj, for all j = l,...,r. 

3.1. Applications. There is a wide range of applications for problem (|3.3p in 
areas such as signal processing, computer vision and quantum information. We briefly 
illustrate the broad potential of p.3p by four examples. 

3.1.1. Best multilinear rank-(TOi, . . . , rur) tensor approximation. The prob- 
lem of best approximation of a tensor by a tensor of lower rank is important in areas 
such as statistics, signal processing and pattern recognition. Unlike in the matrix 
case, there are several rank concepts for a higher order tensor, [5T1[55]. For the scope 
of this paper, we focus on the multilinear rank case. 

A finite dimensional complex tensor A of order r is an element of a tensor product 
Vi ® ■ ■ ■ ^Vr, where Vi, . . . , 14- are complex vector spaces with dim Vj = Uj. Such 
an element can have various representations, a common one is the description as an 
r— way array, i.e. after a choice of bases for Vi, . . . ,Vr, the tensor A is identified with 
[a»i...v]ri=w.=i e C"l^"^^■■■^"^ see e.g. ^. The j-th way of the array is referred 
to as the j-th mode of A. A matrix X G C^^x"^ acts on a tensor j\^ ^ C"^yn2x-xn^ 
via mode—j multiplication Xj, i.e. 

[Ji y-j yy )ii...ij_j^kiij-)-i...ir ^^ / ^ flii...ij_ifc2ij + i...ir*^fclfc2 I I''-"/ 

fe2=l 

cf. [201 [28]. 

It is always possible to rearrange the elements of A along one or, more general, 
several modes such that they form a matrix. Let ^i, . . . , /^ and ci, . . . , Cp be ordered 
subsets of 1, . . . , r such that {li, . . . , /g}U{ci, . . . , Cp} = {1, . . . , r}. Moreover, consider 
the products Nk := m^^^ ■■■ni^, N'j, := ric^^^ • • • ricj, , for fc = 0, . . . , g - 1 and k = 
0, . . . ,p— 1, respectively. Then, the matrix unfolding of A along (/i, . . . , Iq) is a matrix 
^{ii,....i ) of size iVo X Nq such that the element in position (ii, . . . , v) of ^ moves to 
position {s,t) in An^i \, where 

q~l p-l 

■'*-*', +E(*'--l)^fc and t:=i,^+J2iic,-l)Ni (3.7) 

fe=l k=l 

As an example, for a third order tensor A G C^^^^^ we obtain the following matrix 
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unfoldings as in [SD] 



Ai 



(1) 



am aii2 0121 0122 
1211 0212 0221 0222 



A 



(2) 



^111 aii2 a2ii 0212 
0.121 0,122 0221 0222 



1(3) 



am 0121 0211 0221 
0112 0122 0212 0222 



The multilinear rank of ^ G 



^""^ is the r— tuple {mi, . . . , m,,) such that 



ii = rank Ati\ , 



, rur 



rank A 



(r)- 



(3.8) 



To refer to the muhihnear rank of A we wiU use the notation rank-(?7ii, . . . , m^) or 
rank A = (mi, . . . ,mr). Given a tensor A G C"ix-x'v^ .^^g g^j.g interested in finding 
the best rank-(77ii, . . . , rrir) approximation of A, i.e. 



min \\A-B\\ 

rank(6)<(Tni ,...,?n^) 



(3.9) 



Here, ||^|[ is the Frobenius norm of a tensor, i.e. ||^P — (^,.4.) with 



{A,B) =vcc{A)'<Ycc{B) 



E 



2l ,....ir — l 



-^ll ...t-r^li ...Ir • 



(3.10) 



Here, vec(^) refers to the matrix unfolding A(i ,,) G C . In the matrix case, the 
solution of the optimization problem p.9p is given by a truncated SVD, cf. Eckart- 
Young theorem [7]. However, for the higher-order case, there is no equivalent of the 
Eckart- Young theorem. According to the Tucker decomposition [31] or the higher or- 
der singular value decomposition (HOSVD) |20) . any rank-(?7ii, . . . , rrir) tensor can be 



written as a product of a core tensor S and r Stiefel matrices Xi G C" 



XrG 



I.e. 



B^S XiXiX2--- XrXr, X^-Xj = /^^ , j = l,...,r. 
Thus, solving p.9p is equivalent to solving the maximization problem 



max WAX-iXi Xo- ■ ■ X-r Xr\\\ 
Xi,...,X^ 



(3.11) 



with X'-Xj — Irrij, j — 1, ...,?', see e.g. [5]. Using vec— operation and Kronecker 
product language, one has 



vec(^ Xi Xi X2 • • ■ Xr Xr) — vec(^)^(Xi 



^Xr 



(3.12) 



According to p.lOp and the properties of the trace function, the best multilinear 
rank-(?7i,i, . . . , rur) approximation problem becomes 



max tr A{Pi 

(Pi,...,Pr)eGrX'-(m,n) 



'Pr) 



(3.13) 



with A = vec(^)vec(^)t and Pj = X^x], j = 1, . . . , 
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3.1.2. A geometric measure of entanglement. The task of characterizing 
and quantifying entanglement is a central theme in quantum information theory. 
There exist various ways to measure the difference between entangled and product 
states. Here, we discuss a geometric measure of entanglement, which is given by 
the Euclidean distance of z e C^ with ||z|| = 1 to the set of all product states 
V = {xi (» ■ ■ ■ «> Xr \ Xj e C^^ , \\xj\\ = 1}, i.e. 

<5e(z) :=min||2~xf . (3.14) 

xev 

Since any minimizer of Se is also a maximizer of 

max |z^(a;i (8) • • • (8) Xr)!, (3.15) 

and vice versa, computing the entanglement measure (|3.14p is equivalent to solving 



max tr M(Pi (g) • • • (g) Pr) , (3.16) 

(Pi,...,P^)eGrX'-(m,n) V / 

with A = zz^ and Pi = xix\^ ■ ■ ■ ,Pr = Xrx\.. Note that (|3.16p actually constitutes a 
best rank— (1, . . . , 1) tensor approximation problem [Hj. 

3.1.3. Subspace clustering. Subspace segmentation is a fundamental problem 
in many applications in computer vision (e.g. image segmentation) and image pro- 
cessing (e.g. image representation and compression). The problem of clustering data 
lying on multiple subspaces of different dimensions can be stated as follows: 

Given a set of data points X = {xi e ]R"}^^j^ which lie approximately in r > 1 
distinct subspaces Sk of dimension d^, ^ l£ dk < n, identify the subspaces Sk without 
knowing in advance which points belong to which subspace. 

Every dk dimensional subspace Sk C M" can be defined as the kernel of a rank 
TTife = n — dk orthogonal projector Pk of M"'' , with nk = n as 

Sk = {x e M" I Pkx = 0}. (3.17) 

r 

Therefore, any point x G U Sk satisfies 

|iPix||-||P2x|j---||P.x||=0, (3.18) 

which is equivalent to 

tr(a::x^Pi) tr(a:;2:^P2) ■ • ■ tr{xx^ Pr) = tr ({xx^ (g) • ■ • ® xx^){Pi ® • ■ • (g P^) ] = 0. 

(3.19) 
Thus, the problem of recovering the subspaces Sk from the data points X can be 
treated as the following optimization task: 



mm 



y"nil^'=2;zf = min tr (a(Pi ^ • • • (g P,)) , (3.20) 

withP:= {Pi,...,Pr) and 

L 

A:=^xixJ (S)---<E)XixJ. (3.21) 



1=1 
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Wc mention that here we have used the same notation Gr^'^(m,n) to refer to the 
direct r— fold product of real Grassmannians. 

For best multihnear rank tensor approximation and subspace clustering applica- 
tions, numerical experiments are presented at the end of Section 4. 

3.1.4. A combinatorial problem. Let A = {^jk)lL'ik=i ^^ ^ given array of 
positive real numbers and let mi < rii, 1112 < ^2 be fixed. Find mi columns and 7712 
rows such that the sum of the corresponding entries \jk is maximal, i.e. solve the 
combinatorial maximization problem 

max max > A.fe. (3.22) 

JC{l,...,«2}KC{l,...,ni} f^ ■' 

We can permute mi columns and ?7i2 rows of A by right and left multiplication with 
permutations of the standard projectors Hi and 112, respectively. Hence, problem 
p.22p is solved by finding permutation matrices ui and tT2 which maximize: 

^(n,,An,j„, (3.23) 

where '^- is the sum over all entries and Hai '■= crJllicJi, 11^2 '■= crjn2cr2- The sum 
in (|3.23p can be written as 

^(n,,An,j,y = ^ ( (n,, ® n,jvec(A) J = trM(n,, «)n,j), (3.24) 

where A := diag(vec(A)). The last equality in p. 241) holds since Hai (8)11^2 is diagonal, 
too. According to Proposition [221 we have the following equivalence 



max tr A(n„, «> H^J = max tr A(Pi (g> P2) ] ■ (3.25) 

'^U'^2 V / (Pi,P2)eGrX2(m,n) V / 

Hence, we can embed the combinatorial maximization problem (|3.22l) into our con- 
tinuous optimization task p.3p . The generalization of (|3.22l) to A being an arbitrary 
multi-array is straight-forward. 

Problems of this type arise in multi-decision processes such as the following. As- 
sume that a company has ni branches and each branch produces 712 goods. If Xjk 
denotes the gain of the j— th branch with the fc— th good, then one could be interested 
to reduce the number of producers and goods to rrii and 7712 , respectively, which give 
maximum benefit. 

3.2. Riemannian optimization. We continue our investigation of problem 
p.3p by computing the gradient and the Hessian of pA- In the following lemma 
we establish multilinear maps ^a,j, which will enable us to derive clear expressions 
for the gradient and the Hessian of pa- 
Lemma 3.3. Let A G \)CV[^ and {Xi, . . . ,Xr) £ f)ec„^ x • • • x f)cr„^. Then, for all 
j = 1, . . . , r there exists a unique map "^aj '■ f)er„^ x • • • x f)er„^ -^ C"^ ^"^ such that 

tr (^A{Xi ■ • • XjZ • • ■ ® Xr)j = tr (^^a,j{Xi, ..., Xr)Z^ (3.26) 
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holds for all Z e C"^^"^. In particular, one has 

tr (a{Xi ® • • • ® Xr)] = tr ( *a,i(^«i , ^2, • ■ • , Xr)Xi 



(3.27) 
trf *^^^(Xi,...,X^_i,/„jX/ 



Moreover, for A := Ai (E) • ■ • ® Ar the maps ^a,j exhibit the explicit form 

*A,j(^i,...,X,)= [ II triAkXkUAjXj. (3.28) 

Proof. Fix J and consider the linear functional 

Z h^ \a{Z) := tr iA{Xi ® ■ ■ ■ (g> XjZ ®---® Xr) 

By the Riesz representation theorem, there exists a unique Bj e cijX'b such that 
\a{Z) = tr {B-jZ) for all Z e C"^^"^. Therefore, the map *A,i is given by 
(Xi, . . . ,Xr) H> ^ A,j{Xi, . . . ,Xr) '■= Bj. It is Straightforward to show that ^aj is 
multilinear in Xi, . . . ,Xr. Now, choosing Z := Xj and Xj := /„ in p.26p immedi- 
ately yields p.27p . Moreover, p.28p follows from the trace equality 

tr (^1X1 (g) • ■ • ® AjXjZ (g) ■ • ■ ArXr) = ( YY tr(^fe-'^fe) ) tr(AjXjZ). 

Thus the proof of Lemma 13.31 is complete. D 

Remark 3.4. The linear maps '^ a,j constructed in the above proof are almost 
identical to the so-called partial trace operators — a well-known concept from multi- 
linear algebra and quantum mechanics J^. 

Next, we show how to compute "^AjiXi, . . . , Xr) for given {Xi, . . . , Xr) G ^^''^m ^ 
• • • X ()er„ if A is not a pure tensor product Ai ® ■ ■ ■ ® Ar. 

Lemma 3.5. Let A e fjer^y and {Xi,. . . ,Xr) e ()er„j x ••• x f)ec„^. Then, the 
{s,t)— position of^A,j{Xi, . . . ,Xr) G C"^^"' is given by 

y e^^ g) ■ • • ej (g) ■ • • (g) eJ^A{Xi g) • ■ • g) Xr)ei^ g) • • • g) e* g) • ■ • g) e^^, (3.29) 

1=1,.. .,r 

where {e^j }"Lj denotes the standard basis of C"' . 

Proof. Let 1 < s,t < Uj. Then, the element in the {s,t) position of the matrix 
*Aj(^i, . . . , Xr) is given by 

eJ^AjiXi, . . .,Xr)Ct = tli^A.jiXi, . . .,Xr)CtcJ 

= tr ( A{Xi g) • ■ • g) XjeteJ g) • • • g) X^ 



trM(Xi g)---g)Xr)(/„i g)---g)etej (g • • • g) /„ J j . 
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Hence, p.29l) follows from the identity /„j = ^ <2i,cJ . D 

Remark 3.6. Let A e f)ec^ and {Xi, . . . ,Xr) £ t)er„^ x • • ■ x f)ec„^. A straight- 
forward consequence of the identity 

■(SZ(i,---®Xr)] =trM(Xi ® •••®Z+® •••®X^) j, (3.30) 



tr A{Xi' 



for all Z G C"^ ^""^ , shows that '^a.j {Xi ,...,/„.,..., Xr) is Hermitian. For simplic- 
ity of writing, whenever (Pi, . . . , Pr) € Gr^''(m, n) is understood from the context, 
we use the following shortcut 

^■:=vl/^j(Pi,...,/„^.,...,P,). (3.31) 

Now, we can give an explicit formula for the Riemannian gradient of pA and derive 
necessary and sufficient critical point conditions. 

Theorem 3.7. Let A e [)er^, P := (Pi, . . . ,Pr) £ Gr^''(m, n) and let pA he the 
generalized Rayleigh- quotient on Gr^'^(m, n). Then, one has the following: 

(i) The gradient of p^ at P with respect to the Riemannian metric l\2.12^ is 

gradpA(P) = (adp^li, . . . , adp^X) . (3.32) 

(ii) The critical points of pA on Gr^'^(m, n) are characterized by 

[P„1,]=0 (3.33) 

i.e. Pj , j — I, . . . ,r is the orthogonal projector onto an nij^ dimensional invariant 
subspace of Aj. 

Proof (z) Fix P := (Pi, . . . , P,.) G Gr^''(m, n) and let pA denote the canonical 
smooth extension of pA to ()er„ x • ■ • x her„ . Then, 

r r 

Dpa{P){X) - ^tr (a(Pi ® • • • (8)Xj ® • • • ® Pr)) = ^tr(IjXj), (3.34) 

for all X := (Xi, . . . ,Xr) £ [)er„^ x ••• x f)er„_,. From (|2.12p . we obtain that the 
gradient of pA at P is given by Vpa{P) — {Ai, . . . , Ar). Thus, according to (|2.5p and 



grad pAiP) = (ad^^ Ai, . . . , ad^^ A,j . (3.35) 

{ii) P := (Pi, . . . , Pr) G Gr^''(ni, n) is a critical point of pA iff grad PAiP) = 0. This 
is equivalent to 

P,[P„A,] = [P„I,]P„ (3.36) 

for all j = 1 , . . . , r. By multiplying (j3.36p once from the left with Pj and once from 
the right with Pj, we obtain that PjAj = PjAjPj and AjPj = PjAjPj. Hence, the 
conclusion [Pj, ^j] = holds for all j = 1, ... , r. D 

As a consequence of Theorem 13. 7[ we immediately obtain the following necessary 
and sufficient critical point condition. 
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Corollary 3.8. Let A e [)er^, P := (Pi, ■ ■ ■ ,Pr) e Gr^''(m, n) and let 6^ € 



SU„. be such that Q-PjQj — II,-, where Hj is the standard projector in Gim n • We 



write 

r ^' $"' 

(3.37) 



e]I,e, 



3 3 



J 3 



with '^j e ()er,„., *" S [}er„._„j., anrf *"' G C™^^'^"^"™^^ T/ien P is a critical point 
of Pa if and only if 

*f = 0, (3.38) 

for all j = l,...,r. 

For the rest of this section we are concerned with the computation of the Rie- 
mannian Hessian of pA and also with its non-degeneracy at critical points. 

Theorem 3.9. Let A e [)er^ and P := (Pi,...,Pr.) G Gr^''(m, n). Then, the 
Riemannian Hessian of pA at P is the unique self-adjoint operator 



Hp^(P) : TpGr><''(m,n) ^TpGr^''(m,n), 
e := (Ci, . . . , ^.) ^ Hp, (P)(0 := (Hi(0, • . • , H,(0 

defined by 



(3.39) 



H,(0 := -adp^.ad;j^^, + Y. ad|^*^j(A, ■•■ ,^n, ,...,&,..., P.), (3.40) 

k=l,k^3 

where Aj is the shortcut for ^A.j (Pi , ■ ■ ■ , Ln , ■ ■ ■ , Pr) ■ 

Proof. Let {Xi, . . . ,Xr) denote a smooth extension of grad pA to f)er„^ x- • •x[)er„^. 
According to p.32p . we can choose P n- Xj{P) — ad^p Aj. Then, 

DXj{P){X) = adx, adp^. A^ + adp^ adx, Aj 

+ E ad?,^*A.,(Pi,...,/„^.,...,Xfc,...,P,), ^^'^^^ 

for all P := (Pi , . . . , P^) and X := (Xi , . . . , X^) in [)er„^ x • • • x f)er„^ . Notice that, 
the derivative of the linear map Pk M- "^a.j (Pi , ■ ' ' i In ,- ■ ■ , Pk, ■ ■ ■ , Pr) in direction 
Xk e tier„^ (fc # j) is *A,j(Pi, . . . , /„^ , . . . , Xfe, . . . , P/). Applying ^^ and (piB . 
the Riemannian Hessian of p^ at P £ Gr^'^(ni, n) is given by p.39p and p.40p . Here, 
we have used the following two facts: 
(i) Clearly, ad^^j is skew-hermitian and hence 

- adp^^ad^^j = adp^^ ad^^ Aj (3.42) 

is in the tangent space Tp.Gr,„.^„. for all ^j £ Tp.Grm ,n ■ 

(ii) A straightforward computation shows that ad^ adp. Aj is in the orthogonal com- 
plement of Tp.Grm.^„. and hence 

adp^.ad^^adp^.4j =0 (3.43) 



OPTIMIZATION ON TENSOR PRODUCTS OF GRASSMANNIANS 15 

foralUj GTp^.Gr,„^.^„^..D 

By restricting the tangent vectors {£,i,---,^r) £ TpGr^'^(ni, n) to the vectors 
of the form (0, . . . , ^j, . . . , 0), it fohows immediately a necessary condition for the 
non-degeneracy of the Hessian at local extrema. 

Theorem 3.10. Let A e fjer^, and P e Gr^''(m, n) be a local maximizer (local 
minimizer) of pA- IfHpj^{P) is non-degenerate, then for all j = 1, . . . ,r the equality 

a(*;-)n(T(*;') = 0, (3.44) 

holds with ^P' and ^" as in ^3.37^ . Here, cr(^) denotes the spectrum of X . 

Remark 3.11. In the case when A e h^^N ^'^^ ^^ diagonalized by elements in 
SU(n) = {01 • ■ • ® 0r I 0j G SUnj}, condition 's AA^ is also sufficient for the 
nondegeneracy of the Hessian of pA at local extrema. 

In the remaining part of the section we derive a genericity statement concerning 
the critical points of the generalized Rayleigh-quotient. The result is a straightforward 
consequence of the parametric transversality theorem [15]. Let V, Ai, JV he smooth 
manifolds and F -.Y x M -^ Af a smooth map. Moreover, let T(^a,p)F : V x TpM — )■ 
Tf(a,p)A/' denote the tangent map of F at {A,P) £ V x M. We say that F is 
transversal to a submanifold S G Af and write i^ (ti S* if 

ImT(^,P)F + Tf{a.p)S = Tf{a.p)-^, (3.45) 

for all {A,P) G F~^{S). Then, the parametric transversality theorem states the 
following. 

Theorem 3.12. (JWj) Let V, Ai, J\f be smooth manifolds and S a closed sub- 
manifold of Af. Let F : V X A4 ^ Af be a smooth map, let A £ V and define 
FA-.Ai^Af, Fa{P) := F{A, P). IfF(hS, then the set 

{AeV \FA(hS} (3.46) 

is open and dense. 

Now, let fA ■ Ai ^i' M. he a, smooth function depending on a parameter A £ V 
and consider the map 

F -.V xAi^ T*M, F{A, P) := dfA{P), (3.47) 

where T*A^ is the cotangent bundle of Ad and d/^(P) denotes the differential of /^ 
at P £ A4. With these notations, our genericity result reads as follows. 

Theorem 3.13. Let AI , V and F be as above and let S be the image of the zero 
section in T*A^. If F (h S then for a generic A €Y the critical points of the smooth 
function /^ : Al — > K. are non- degenerate. 

Proof. Fix A eV and define 

Fa:M^ T*M, Fa{P) := F{A, P) (3.48) 

From the Transversality Theorem 13.121 it follows that the set 

R:={AeV\FA^S} (3.49) 

is open and dense in y if i^ rh S". In the following, we will prove that Fa fh 5" is 
equivalent to the fact that the Hessian of /a is non-degenerate in the critical points. 
This will prove the theorem. 
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First, notice that Pc G F^^{S) if and only ii Pc & A4 is a. critical point of Ja- 
Therefore, the transversality condition 

ImTp^F^ + Tj.^(P,)5 = Tp^(^p^)T*M, (3.50) 

is equivalent to 

ImTp^FA+ToS' = ToT*X. (3.51) 

To rewrite this condition (|3.5ip in local coordinates, let (p : U ^ W C Tp^Ad be a 
chart on an open subset U C J^ around Pc such that (p~^{0) ~ P^ and £'(^"^(0) = id. 
Then define 

Ja ■■= Ja o V3-1 : M^ ^ R. (3.52) 

Moreover, ip induces a chart i/' : tt~^{U) -^ W x Tp M C Tp^M x Tp M around 
^A(Pc)via 

^{^) = [x,[D^-^{x)Y{^)), x:=(po7r(7), (3.53) 



Here, tt : T*M. -^ M refers to the natural projection and {Dip ^{x))*{^) :— 7 
Dip-^{x). Thus, for 



o 



FA:=^poFAOip-^ -.W -^W xT*P^M (3.54) 

one has Fa{x) = {x,dfA{x)). Since transversality of Fa to S is preserved in local 
coordinates, p.5ip is equivalent to 

ImDFA(O) + Tp^A^ X {0} = Tp^TW x Tp^TW. (3.55) 

Then DFa{0) = (id,rf2/^(o)) yields that ([XTO)) is fulfilled if and only if (fjA{^) is 
nonsingular. Finally, the conclusion follows form the identity Hess/^ (Pc) = <PfA{^) 
which is satisfied due to the fact that Pc is a critical point and Z?(^~^(0) = id. 
Here, Hess/^ (Pc) denotes the Hessian form corresponding to the Hessian operator 
via Hess/4(Pc)(x,y) = (lif,^{Pc)x,y) for all x,y e Tp^M. D 

For the generalized Rayleigh-quotient, we obtain the following result. 

Corollary 3.14. The critical points of the generalized Rayleigh-quotient are 
generically non-degenerate. 

Proof. Set Ai := Gr^'^(m, n), V := f)er^. For the simplicity, we will identify the 
cotangent bundle T*A^ with the tangent bundle TA^ and work with the map 

F:Y xM^ TM, {A, P) ^ grad/9A(P), (3.56) 

instead of p.47p . where grad(0/i(P) is the Riemannian gradient of pA at P. We will 
show that P (ti 5, where S is now the image of the zero section in TAJ, i.e. 

ImT(_4_p)P + Tp(^^p)S' = Tp(^_p)T7W, (3.57) 

for all {A,P) &V X M with grad/9A(P) = 0. As in the proof of Theorem 13.131 we 
rewrite the transversality condition p. 571) in local coordinates, i.e. 

lmDF{A,Q)+i:pM x {0} = TpX x TpAl, (3.58) 
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where 

F ■.= ^PoFo{idx tp-^) -.V xW ^W X TpM. (3.59) 

Here, ip : U ^ W C TpM is a chart around P with (p~^{0) = P and Dip~''-{0) = id 
and tp : 7r~^{U) -^ W x TpM C TpM x TpM is the corresponding induced chart 
around F{A, P). With this choice of charts, we obtain 

F{A,x)^{x,Vpa{x)), (3.60) 

where pA '■= PA ° f~^ : W -^ R. Since A i-^ Pa{0) is hnear, one has 

DF{A, 0)iX,0 = (e,Vpx(0) + d2/^(0)^) . (3.61) 

Thus, condition p.58p holds if and only if 

ImVp(.)(0) + Imff2/^(0) = TpX. (3.62) 

Finally, we will show that ImVp(.)(0) = TpM which clearly guarantees p.62[) . Let 
^ := (^1, . . . ,^r) £ (Im Vp()(0)) , then we obtain 

= (Vpx(0),0 = dpxm = dpx{P)^ 

f S^ \ (3.63) 

for all X G tiec^. Notice, that the equality (ipx(O)C — dpx{P)£, follows from 
1)^-1(0) =id. Therefore, 

r 

^Pi ® ■ ■ ■ ® ij ® ■ ■ ■ ® Pr ^ Q (3.64) 

and this holds if and only if ^i = 0, . . . , ^^ = 0, since alls summands in p.64p are 
orthogonal to each other. Thus, we have proved that F rtl TpM x {0} and hence 
F &[ S. From the Theorem 13.131 it follows immediately that the critical points of the 
generalized Rayleigh-quotient are generically non-degenerate. D 

Unfortunately, for best multilinear rank tensor approximation and subspace clus- 
tering problems, we cannot conclude from Corollary 13.141 that the critical points of 
Pa are generically non-degenerate. In these cases, the resulting matrices A are re- 
stricted to a thin subset of f)er^ and thus the genericity statement with respect [)er^ 
in Corollary 13.141 does not carry over straight-forwardly. 



4. Numerical Methods. Exploiting the geometrical structure of the constraint 
set Gr^'^(m, n), we develop two numerical methods, a Newton-like and a conjugated 
gradient algorithm, for optimizing the generalized Rayleigh-quotient pA^ with A G 
f)erjv, N := nin2 ■ ■ -rir. 
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4.1. Newton-like algorithm. The intrinsic Riemannian Newton algorithm is 
described by means of the Levi-Civita connection taking iteration steps along geodesies 
[njm]. Sometimes geodesies are difficult to determine, thus, here we are interested in a 
more general approach, which introduces the Newton iteration via local coordinates, 
see [H [131 HZ]- More precisely, we follow the ideas in [13] and use a pair of local 
coordinates on Gr^''(m, n), i.e. normal coordinates and QR-coordinates. 

Recall that, a local parametrizatiori^ of Gr^'^(m, n) around a point P :— (Pi , . . . , Pr) 
is a smooth map 

fip : TpGr^''(m,n) ^ Gr^''(m,n) 

satisfying the additional conditions 

/ip(0)=P and I^MpIO) =idTpGrX'-(m,n)- (4-1) 

Riemannian normal coordinates are given by the Riemannian exponential map 

^^'^P(^) = (e-[«i^^ilPie[«i^^il,...,e-[«-^'-lp,e[«-^-l), (4.2) 

while QR-type coordinates are defined by the QR-approximation of the matrix expo- 
nential, i.e. 

t^fiO - {[Xl]l Pi [Xi]q, . . . , [Xr]^ Pr [^,-]q)- (4.3) 

Here [Xj]q denotes the Q— factor from the unique QR decomposition of Xj := / + 

Now. let P* :— {P*, . . . , P*) € Gr^'^(m, n) be a critical point of p^. Ghoose 
P G Gr^'^(ni, n) in a neighborhood of P* and perform the following Newton-like 
iteration 

pncw _ ^QR(^)^ (4.4) 

where ^ := (^i, . . . , ^,,) G TpGr^''(m, n) is a solution of the Newton equation 

Hp,(P)e = -gradp^(P). (4.5) 

Replacing the objects in (|4.5p by their explicit form computed in the previous section, 
we get the following Newton equation; 

r 

- adp^. ad;j^ ^j + Yl ^d^^. ^A,j (Pi ,...,/„,,..., Cfc ,•••, -Pr) = -ad^^. A, , (4.6) 

for aU j = 1, . . . , r. As mentioned before, let Aj := \I>^j(Pi, . . . , /„^. , . . . , Pr). Solving 
this system in the embedding space f)er„ x ■ • • x f)er„ requires a higher number of 
parameters than necessary. However, exploiting the particular structure of the tangent 
vectors 



0-e,oe: = e 



jSj^j 



Z, 

A 



ej, (4.7) 



'Clearly, one can define a local parametrization more generally, i.e. without requiring the second 
part of \A.\\ . 
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allows us to solve (j4.6|) with the mininium number of parameters equal to the dimen- 
sion of Gr^'^(m, n). Thus, by multiplying (|4.6p from the left with 8j and from the 
right with 0|, we obtain an equation in the variables Zj G C™^^^"^^™^^ i.e. 

^'^Z,~Z^'l- Y. ^A^u)^^'-. (4.8) 

where the terms vp'. v]/'' vj/'" and $j(Zfc) are computed in the following. Let 

e, = [ U, V, ] gSU„^, (4.9) 

where \Jj and Vj are Uj x rtij and rij x (jij — nij) matrices, respectively. Then, 

*;. =t/]l,C/„ ^] = V}A,V,, ^'J' = u]A,Vj. (4.10) 

For expressing $j(Zfc) with j < k , we introduce the multilinear operators ^ A,j,k '■ 
f)er„^ X • ■ • X f)er„^ — !> C"^"''^"j"'= defined in a similar way as 'i'A.j by 

tr (a(Xi0- • ■(E)XjS(g)- ■ -(^XkT®- ■ •(g)X^)j = tr (^^a^jA^I: ■■■, X^)(S'®T)) , (4.11) 

for all S G C"j ^"^ and T G C"* xnfc , Por convenience, we will use the following shortcut 

Ajk :- *A,j,fe(-Pi, . ..,/„,,..., /„., . . . , Pr) e f)er„^..„,. (4.12) 

Furthermore, we partition the matrix Ajk into block form 

A,k =: [^st]:U, (4.15) 

where each Sst is an Uk x n^ matrix. Then, the linear map $j : C^kxl"^-™*:) _>. 

Zfe H^ $,(Zfc) := (7] [tr(f/,!a,,FfcZ| + Zfe^Jal^t/fc)] "'^^V,. (4.16) 

Finally, the complete Newton-like algorithm for the optimization of pA on Gr^'^(m, n) 
is given by Algorithm 1. 

Suggestions for implementation, (a) For an arbitrary matrix A G f)erjv, the 
computation of Aj and Ajk is performed according to formula p.29p . This can be 
simplified in the case of the applications described in Section 3.3. 
Case 1. liA^vv^ with w = vec(y^), A£ C"!^'-^"'-, then 

^j = ^0") • ^0) e fiec„^. and A^k = C^j^k) ■ Cj,- ^ G f)et„^„,, (4.17) 

where Btj\ and C/j^^ are the j— th mode and respectively (j, fc)— th mode matrices 
of the tensors B = ^ Xi L/J X2 • • ■ x^. /„^ x^+i • ■ • x^ [/^ and C ~ AxiUl X2 ■ ■ ■ ^j 
Inj Xj+i ■ • • Xfc /„^ Xfc+i ■ • • x^ 0, respectively. 

L 

Case 2. If A = ^ xixj • • ■ (g) Zix]' with a;; G C" and L G N, then 



;=r 



r times 



^J=E(nil^»^'ll')^'^i and A,k=J2i n ll^.2:if)x,a:|«.a;zx|. (4.18) 



/=1 i=l 1=1 i=l 
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ALGORITHM 1. 



N-like algorithm 



Step 1. Starting point: Given P — (Pi, . . . , P^) G Gr^''(ni, n) choose 

such that Pj = GjIIje], for j = 1, . . . , r. 

Step 2. Stopping criterion: \\gra,dp_^{P)\\/pA{P) < £■ 

Step 3. Newton direction: Set 



and compute ^'p *", *"' as in (|i?TUl) . for j = 1, . . . , r. 
Set 

^jfe := ^A j,fc (-Pi , • ■ • 7 -^n, 1 ■ • ■ : Ink : • ■ • 7 -Pr) 



and compute ^j{Zk) as in (|4.15|) and (|4.16p . for j, k = l,...,r, with j < k. 
Furthermore, set ^k{Zj) = $j(Zfc)^^ and solve the Newton equation 



"^'jZ.-Z,^'^- Y. $,(Zfe) = vI/ 



(4.13) 



to obtain Zj e c™. x(".-™.)^ for j = 1, . . . , r. 
Step 4. QR-updates: 



ej'™ = Bj 



z] /„ 



and P"™ = BjUjO 



new' 
j 



(4.14) 



Q 



for all j = 1, . . . , r. Here [ ]q refers to the Q part from the QR factorization. 
Step 5. Set P := P"'^^, 6 := 9"™ and go to Step 2. 



(b) To solve the system (|4.8p . one can rewrite it as a linear equation on M^ (d is the 
dimension of Gr^'^(m, n)) using matrix Kroneckcr products and vec— operations, then 
solve this by any linear equation solver. 

(c) The computation of geodesies on matrix manifolds usually requires the matrix 
exponential map, which is in general an expensive procedure of order O(n^). Yet, 
for the particular case of the Grassmann manifold Grm „, Gallivan et.al. jlO| have 
developed an efficient method to compute the matrix exponential, reducing the com- 
plexity order to 0{nw?') (m < n). Our approach, however, is based on a first order 
approximation of the matrix exponential c^'^->^> followed by a QR-decomposition to 
preserve orthogonality/unitarity. Explicitly, it is given by 



I-ni ^Z 

zt / 



w 



-stpi-i 





p-1 





w\ 



(4.19) 
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where Z ^ XSrt with X e SU„, Y e C^"-™)^™, y}y, = /„. and S G C™^™ 
diagonal. Furthermore, 

W := 



Xt 

y y 



e SU„, Z? := v//™ + StS, (4.20) 



where [1" y ] G SU„-m is an unitary completion of Y . The computational complexity 
of this QR-factorization is of order 0{{n — m)m'^). 

(d) The convergence of the algorithm is not guaranteed for arbitrary initial conditions 
P £ Gr^'^(m, n) and even in the case of convergence the limiting point need not 
be a local maximizer of the function. To overcome this, one could for example test 
if the computed direction is ascending, else take the gradient as the new direction. 
Furthermore, one can make an iterative line-search in the ascending direction. 

In the following theorem we prove that the sequence generated by Algorithm 1 
converges quadratically to a critical point of the generalized Rayleigh-quotient pA if 
the sequence starts in a sufficiently small neighborhood of the critical point. 

Theorem 4.1. Let A e f)ec^ and P* e Gr^''(ni, n) be a non-degenerate critical 
point of the generalized Rayleigh-quotient pA, then the sequence generated by the N-like 
algorithm converges locally quadratically to P* . 

Proof. For the critical point P* G Gr^'"(m, n), the Riemannian coordinates 
(|42l) and the QR- coordinates (|43| satisfy the condition Dp'pfifi) = D/x^?(0) = 
idTj,.Gr>^''(m,n)- Thus, according to Theorem 4.1. from [l^ there exists a neighbor- 
hood V C Gr^''(m, n) such that the sequence of iterates generated by the N-like 
algorithm converges quadratically to P* when the initial point P is in V. D 

4.2. Riemannian conjugated gradient algorithm. The quadratic conver- 
gence of the Newton-like algorithm has the drawback of high computational complex- 
ity. Solving the Newton equation (|4.8p yields a cost per iteration of order 0{d^), 
where d is the dimension of Gr^''(m, n). In what follows, we offer as an alternative to 
reduce the computational costs of the Newton-like algorithm by a conjugated gradient 
method. The linear conjugated gradient (LCG) method is used for solving large sys- 
tems of linear equations with a symmetric positive definite matrix, which is achieved 
by iteratively minimizing a convex quadratic function x^ Ax. The initial direction do 
is chosen as the steepest descent and every forthcoming direction dj is required to be 
conjugated to all the previous ones, i.e. d^Adk = 0, for all A: = 0, • • ■ , j — 1. The exact 
maximum along a direction gives the next iterate. Hence, the optimal solution is found 
in at most n steps, where n is the dimension of the problem. Nonlinear conjugated 
gradient (NCG) methods use the same approach for general functions / : M" — > M, 
not necessarily convex and quadratic. The update in this case reads as 

xnow^x + ad and d"'^" = -V/(a;"'=") + /3d, 

where the step-size a is obtained by a line search in the direction d 

a = argmin/(a; + td) (4.21) 

t 

and /3 is given by one of the formulas: Fletcher-Reeves, Polak-Ribiere, Hestenes- 
Stiefel, or other. We refer to [29] for the generalization of the NCG method to a 
Riemannian manifold. For the computation of the step-size along the geodesic in 
direction ^. an exact line search — as in the classical case — is an extremely expensive 
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procedure. Therefore, one conimonly approximates (j4.2ip by an Armijo-rule, which 
ensures at least that the step length decreases the function sufficiently. We, however, 
have decided to compute the step-size by performing a one-dimensional Newton-step 
along the geodesic, since in the neighborhood of a critical point one Newton step can 
lead very close to the solution. Therefore, at P = (Pi,--- ,Pr) G Gr^'^(m, n) the 
step-size in direction ^ — (^i, - - - , ^r) G TpGr^'"(m, n) is given by 



(PAO7)"(0) 



(4.22) 



where -f : I ^ Gr^'"(m, n) is the unique geodesic through P in direction ^. 

Let e := (Gi,--- ,9,-) be such that OfePfeO^ = Efc. Furthermore, let P"™ := 
^pncw ... pnow-) (jgnote the updated point in Gr^''(m,n) via the QR-coordinates 
as in (j4.14p . For the computation of the new direction, a "transport" of the old 
direction ^ = (^i, - - - ,^r) from TpGr^'"(m, n) to the tangent space TpnowGr^'^(m, n) 
is required. We use the following approximation for the paralle transport of ^ along 
the geodesic through P in direction ^ 



?. ^0 





\ ^ 


Z, 1 




new 






anew 


J 


i^ 





i 



where 6"°^ = 9^ 



/„ 



-Z. 



''''j ' 



(4.26) 



for all j = 1 , . . . , r. 

The complete Riemannian conjugated gradient is presented as Algorithm 2. 

It is recommended to reset the search direction to the steepest descent direction 
after d iterations, i.e. Z^'^'" := —5^°^, fc = 1, . . . , r, where d refers to the dimension of 
the manifold. For the maximization of the generalized Rayleigh-quotient the initial 
direction is Zk = gu and the update Z^^^ = gf^ + (i Zk- 

The convergence properties of the NCG methods are in general difficult to ana- 
lyze. Yet, under moderate supplementary assumptions on the cost function one can 
guarantee that the NCG converges to a stationary point [H]. It is expected that the 
proposed Riemannian conjugated gradient method has properties similar to those of 
the NCG. 

4.3. Numerical experiments. In this section we run several numerical exper- 
iments suitable for the applications mentioned in Section 3.2, i.e. best rank approxi- 
mation for tensors and subspace clustering, to test the Newton-like (N-like) and Rie- 
mannian conjugated gradient (RCG) algorithms. The algorithms were implemented 
in MATLAB on a personal notebook with 1.8 GHz Intel Core 2 Duo processor. 



4.3.1. Best multilinear rank-(mi, . . . , rrir) tensor approximation.. To test 
the performance of our algorithms we have considered several examples of large size 
tensors of order 3 and 4 with entries chosen from the standard normal distribution 
and estimated their best low-rank approximation. We have started with a truncated 
HGSVD ([20]) and performed several HGGI iterates before we run our N-like and 
RCG algorithms. Depending on the size of the tensor, the number of HGGI iterations 
necessary to reach the region of attraction of a stationary point P* G Gr^'"(m, n), 
ranges from 10 to 100. As stopping criterion we have chosen that the relative norm 
of the gradient j| grad (P)||/p^(P) is approximately 10""'^'^. 

Computational complexity. The computational complexity of the N-like 
method is determined by the computation of the Hessian and the solution of the New- 
ton equation (|4.13l) . Thus, for the best rank-(?7i,, m, m) approximation of a n x n x n 
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ALGORITHM 2. RCG algorithm 



Step 1. Starting point: Given P ~ (Pi, . . . , P^) G Gr^''(ni, n) choose 

such that Pj = GjIIje], for j = 1, . . . , r. 
Initial direction: Set 

Aj ■.^^A.jiPl,-.-Jn,,--.,Pr), 

compute ^'-j ^", ^"' as in (|4.10p and take the steepest descent direction 

for J = 1, . . . , r. Denote Z := (Zi, . . . , Z^), .g := (.gi, . . . ,gr)- 
Step 2. Stopping criterion: ||grad (P)|[/p^(P) < e. 
Step 3. QR-updates: 

Qncw ^ 0^. 



al„ij aZj 



> ^.=0.n,0--'\ (4.23) 

Q 



with the step-size given by a = —a/{b + c), where 



r— 1 r 

c := E E Pa(A, ..., 0, ...,&,■•■ , Pr), 



for j = 1, . . . , r. The tangent vectors ^j are given in (|4.7p . 

Step 4. Set P := F"™ and 9 := 8"™. 

Step 5. New direction: Update ^j, ^'', ^'" as in (|4.10p and compute the new 



direction 



^ncw ^ _^^ncw ^ ^ z ^ , gf"" := *"', (4.24) 



for J = 1, . . . , r. Here, /3 is given by the Polak-Ribiere formula 



/now now _ \ 

P = ^ ^^— ^ (4.25) 

(5,5) 



Step 6. Set 5 := y"""", Z := Z"<=" and go to Step 2. 



tensor, the computation of the Hessian is dominated by tensor-matrix multiplications 
and is of order 0{n^m). Solving the Newton equation by Gaussian elimination gives a 
computational complexity of order 0{m^{n — m)^), i.e. the dimension of the manifold 
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to the power of three. For the computational costs of the RCG method we have to 
take into discussion only tensor-matrix multiplications, which give a cost per RCG 
iteration of order 0{n^m). 

Experimental results and previous v^rork. The problem of best low-rank 
tensor approximation has enjoyed a lot of attention recently. Apart from the well 
known higher order orthogonal iterations - HOOI ([21]), various algorithms which 
exploit the manifold structure of the constraint set have been developed. We refer to 
[8j [18] for Newton methods, to [28] for quasi-Newton methods and to [17] for conju- 
gated gradient and trust region methods on the Grassmann manifold. Similar to the 
Newton methods in [8j [18] , our N-like method converges quadratically to a stationary 
point of the generalized Rayleigh-quotient when starting in its neighborhood. 

We have compared our algorithms with the existing ones in the literature: quasi- 
Newton with BFGS, Riemannian conjugated gradient method which uses the Armijo- 
rule for the computation of the step-size (CG-Armijo), and HOOI. The algorithms 
were run on the same platform, identically initialized and with the same stopping 
criterion. For the BFGS quasi-Newton and limited memory quasi-Newton (L-BFGS) 
methods we have used the code available in [251. 
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Fig. 4.1. Convergence for multilinear rank tensor approximation: number of iterations versus 
the relative norm of the gradient \\ grad {P")\\/pa{P'^) at a logarithmic scale. Left: 100 X 100 X 100 
tensor approximated by a rank-{5, 5, 5) tensor. Right: 100 X 150 X 200 tensor approximated by a rank- 
(15, 10, 5) tensor. 



Fig. 14. II shows convergence results for two large size tensors 100 x 100 x 100 and 
100 X 150 X 200 approximated by rank-(5, 5, 5) andrank-(15, 10, 5) tensors, respectively. 
In Fig. 14.21 we plot the convergence behavior of the RCG method for the best rank- 
(10, 10, 10) approximation of a 200 x 200 x 200 tensor (left) and for the best rank- 
(5, 5, 5, 5) approximation of a 50 x 50 x 50 x 50 tensor. Due to the limited memory 
space, we were not able to run the N-like and BFGS quasi-Newton algorithms for the 
example on the left. Yet it was still possible to run RCG, L-BFGS, CG-Armijo and 
HOOI. 

In Table 14.11 we display the average CPU times necessary to compute a low rank 
best approximation for tensors of different sizes and orders by N-like, RCG, BFGS 
and L-BFGS quasi-Newton methods. We have performed 100 runs for each example. 

Resume. First we mention that there is no guarantee that the N-like and 
RCG iterations converge to a local maximizer of the generalized Rayleigh-quotient. 
However, in the examples presented in Fig l4.1l and Fig l4.2l the limiting points are 
local maximizers. As the numerical experiments have shown, the N-like method has 
the advantage of fast convergence. Unfortunately, for large scale problems, the N- 
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Fig. 4.2. Convergence for multilinear rank tensor approximation: number of iterations versus 
the relative norm of the gradient \\ grad {P"')\\/pa{P^) at a logarithmic scale. Left: 200 X 200 X 200 
tensor approximated by a rank-{10, 10, 10) tensor. Right: 50 X 50 X 50 X 50 tensor approximated by 
a rank-{5, 5, 5, 5) tensor. 

Table 4.1 
Average CPU Time 



Tensor size and rank 


N-like 


RCG 


BFGS 


L-BFGS 


50x 50x 50, rank-(7,8,5) 


2s 


6s 


24 s 


13 s 


100 X 100 X 100, rank-(5,5,5) 


70s 


75 s 


150 s 


94 s 


200x200x200, rank-(5, 5, 5) 


- 


11 min 


- 


14 min 


50 X 50 X 50 X 50, rank-(5, 5, 5, 5) 


- 


9 min 


11 min 


- 



like algorithm can not be applied, as mentioned before. Even when it is possible to 
apply N-likc algorithm, it needs a large amount of time per iteration. As an example, 
for the best rank-(10, 10, 10) of a 180 x 180 x 180 tensor, one N-like iteration took 
three minutes. Related algorithms which explicitly compute the Hessian and solve the 
Newton equation, such as [51 118| , and those which approximately solve the Newton 
equation such as the trust region method [T7] , face the same difficulty for large scale 
problems. On the other hand, the low cost iterations of the RCG method makes it 
a good candidate to solve large size problems. The convergence rate is comparative 
to that of the BFGS quasi-Newton method in [5S] , but at much lower computational 
costs. Our experiments exhibit the shortest CPU time for the RCG method. In the 
examples in which the tensor was a small perturbation of a low-rank tensor, the RCG 
algorithm exhibits quadratic convergence. 



4.3.2. Subspace Clustering. The experimental setup consists in choosing r 
subspaccs in R^ (r = 2, 3 and 4) and collections of 200 randomly chosertiJ points on 
each subspace. Then, the sample points are perturbed by adding zero-mean Gaussian 
noise with standard deviation varying from 0% to 5% in the different experiments. 
Now, the goal is to detect the exact subspaces or to approximate them as good as 
possible. For this purpose, we apply our N-like and RCG algorithms to solve the 
associated optimization task, cf. Section 3.1. The error between the exact subspaces 



t The points have been generated by fixing an orthogonal basis within the subspaces and choosing 
corresponding coordinates randomly with a uniform distribution over the interval [—5,5]. 
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and the estimated ones is measured as in 1331. i.e. 



^" -zYl arccos( -2 | tr(PjP,) 



j=i 



(4.27) 



where Pj is the orthogonal projector corresponding to the exact subspace and Pj the 
orthogonal projector corresponding to the estimated one. 





Fig. 4.3. Left: Data points drawn from the union of two subspaces of dimension 2 (through 
the origin) of M.^ . Right: Data points from the left figure slightly perturbed by zero mean Gaussian 
noise with 5% standard deviation. 



It can be easily checked that in the case of unperturbed data there is a unique 
non-degenerate minimizer of pA, and it yields the exact subspaces. Thus, we expect 
that for noisy data the global minimizer still gives a good approximation. Since pA 
has many local optima, for an arbitrary starting point our algorithms can converge to 
stationary points which lead to a significant error between the exact subspaces and 
their approximation. Thus, in what follows, we briefly describe a method (PDA, see 
below) for computing a suitable initial point which guarantees the convergence of our 
algorithms towards a good approximation of the exact subspaces in our numerical 
experiment; 

The Polynomial Differential Algorithm (PDA) was proposed in [33] . It is a purely 
algebraic method for recovering a finite number of subspaces from a set of data points 
belonging to the union of these subspaces. From the data set finitely many homoge- 
neous polynomials are computed such that their zero set coincides with the union of 
the sought subspaces. Then, an evaluation of their derivatives at given data points 
yields successively a basis of the orthogonal complement of subspaces one is interested 
in. For noisy data, a slightly modified version of PDA [33] yields an approximation of 
the unperturbed subspaces. This "first" approximation turned out to be a good start- 
ing point for our iterative algorithms which significantly improved the approximation 
quality. 

For each noise level we perform 500 runs of the N-likc and Local-CG algorithms 
for different data sets and compute the mean error between the exact subspaces and 
the computed approximations. As a preliminary step, we normalize all data points, 
such that no direction is favored. 

In Fig. l4.3[ 400 randomly chosen data points which lie exactly in the union of two 
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2-dimcnsional subspaces of R^ (left) and their perturbcq^ images (right) are depicted. 
Moreover, the two plots display the exact subspaces (left) as well as the ones computed 
by our N-like algorithm (right). The error between the exact subspaces and our 
approximation is ca. 2°, whereas the error for the PDA approximation is ca. 5°. 




2 3 

Noise level (% 



20 25 30 
Iterations 



Fig. 4.4. Left: The mean error for noise levels from 0% to 5% and different number 
of subspaces. The disconnected symbols refer to the initial error (PDA) and the corresponding 
continuous lines refer to the error estimated by our algorithms. Right: Convergence of N-like 
and RCG for subspace clustering: number of iterations versus the relative norm of the gradient 
II grad (P")||/pyi(P") at a logarithmic scale. Data points from 3 and resp. 4 subspaces perturbed 
with 5% Gaussian noise. Average CPU time: ca. 0.4 and ca. 2 seconds for the N-like and RCG 
algorithm, respectively (1.8 GHz Intel Core 2 Duo processor). 

In Fig. 14.41 we plot the mean error (left) for different noise levels and different 
number of subspaces. We have included also the mean error for the starting point of 
our algorithms, i.e. for the PDA approximation. On the right we demonstrate the fast 
convergence rate of the N-like and RCG algorithms for the case of 3 and, respectively, 
4 subspaces. 

Resume. Our numerical experiments have proven that (i) the minimization 
task proposed in Section 3 is capable to solve subspace detection problems and (ii) 
our numerical algorithms initialized with the PDA starting point yield an effective 
method for computing a reliable approximation of the perturbed subspaces. How the 
approximation of the perturbed subspaces varies when the noise in the data follows 
some law of distribution, is the subject of future investigation. 

5. Appendix. 

Here we provide a proof of Proposition l2.3[ which states that there exists a global 
Riemannian isometry ip between Gr®'^(in, n) and Gr^'^(m, n). 

Proof. The surjectivity of ip is clear from the definition of Gr®'^(m, n). To prove 
the injectivity of (p we use induction over r. Choose {Pi, ..,Pr), {Qi, . ■ ■ ,Qr) € 
Gr^''(m, n) such that Pi <^ ■ ■ ■ ® Pr = Qi ® ■ ■ ■ ^ Qr, i-c 



,Pr = PijQr for all i, j, 



(5.1) 

•®(5r-i, respectively. 
Thus it exists 7 G C such that P^ = jQr- Since Pr and Qr have only and 1 as 
eigenvalues it follows that 7=1 and Pr = Qr- Therefore, Pi®- • -^Pr = Qi®- ■ ■®Qr 



where aij and Pij are the entries of Pi ( 



• ®Pr-i and Qi ( 



to 



aussian noise 



with 5% standard deviation 
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implies that 

Pi (»•■•«) Pr-l =Ql «)■•■«) Qr-l (5.2) 

and the procedure can be repeated until we obtain Pj = Qj, for all j = 1, . . . , r. Thus 
the injectivity of tp is proven. So (/3 is a continuous bijective map with continuous 
inverse ip~^ due to the compactness of Gr^'"(m, n). Moreover, the map (p is smooth 
since the components of Pi (g) • • • O P,- are polynomial functions. Let P := (Pi , . . . , P^) 
and P := Pi g) • • • (g) P^. Consider the tangent map of (f at P, i.e. 



DifiP) : TpGr^'^(m,n) ^ TpGr®'^(m, n), 

r 
(Cl, • ■ • , Cr) ^ XI -^1 ® ■ ■ ■ ® ^J ® ■ ■ ■ ® ^^■ 



(5.3) 



With the inner products (|2.1ip and (|2.12p defined on fier^v and ()er„^ x • • • x (]et„^, 
respectively, one has 

r r 

3 = 1 k=l,k^j 

This implies that the tangent map Dip{P) is a linear isomctry. Thus, it is invcrt- 
iblc and therefore ip is a local diffeomorphism. Moreover, since ip is bijective it is a 
global diffeomorphism, giving thus a global Riemannian isomctry when the metric on 
Gr='''(m,n) is defined by ^A^. D 
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